Annotating Uncertainty in Hungarian Webtext
Authors
Abstract
Uncertainty detection has been a popular topic in natural language processing, which has manifested in the creation of several corpora for English. Here we show how the annotation guidelines originally developed for standard English texts can be adapted to Hungarian webtext. We annotated a small corpus of Facebook posts for uncertainty phenomena, and we illustrate the main characteristics of such texts, with special regard to uncertainty annotation. Our results may be exploited in adapting the guidelines to other languages or domains and, later on, in the construction of automatic uncertainty detectors.
Related resources
Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation
In this paper we present an experiment to automatically generate annotated training corpora for a supervised word sense disambiguation module operating in an English-Hungarian and a Hungarian-English machine translation system. Training examples for the WSD module are produced by annotating ambiguous lexical items in the source language (words having several possible translations) with their pr...
Detecting Uncertainty Cues in Hungarian Social Media Texts
In this paper, we aim at identifying uncertainty cues in Hungarian social media texts. We present our machine learning based uncertainty detector, which is based on a rich feature set including lexical, morphological, syntactic, semantic and discourse-based features, and we evaluate our system on a small set of manually annotated social media texts. We also carry out cross-domain and domain ada...
Annotating Errors in a Hungarian Learner Corpus
We are developing and annotating a learner corpus of Hungarian, composed of student journals from three different proficiency levels written at Indiana University. Our annotation marks learner errors that are of different linguistic categories, including phonology, morphology, and syntax, but defining the annotation for an agglutinative language presents several issues. First, we must adapt an ...
Uncertainty Detection in Hungarian Texts
Uncertainty detection is essential for many NLP applications. For instance, in information retrieval, it is of primary importance to distinguish among factual, negated and uncertain information. Current research on uncertainty detection has mostly focused on the English language; in contrast, here we present the first machine learning algorithm that aims at identifying linguistic markers of unc...
Lessons learned from tagging clinical Hungarian
As more and more textual resources from the medical domain are getting accessible, automatic analysis of clinical notes becomes possible. Since part-of-speech tagging is a fundamental part of any text processing chain, tagging tasks must be performed with high accuracy. While there are numerous studies on tagging medical English, we are not aware of any previous research examining the same fiel...
Publication date: 2014